Backoff Model Training using Partially Observed Data: Application to Dialog Act Tagging
Authors
Abstract
Dialog act (DA) tags are useful for many applications in natural language processing and automatic speech recognition. In this work, we introduce hidden backoff models (HBMs), in which a large generalized backoff model is trained on partially observed data using an embedded expectation-maximization (EM) procedure. We use HBMs as word models conditioned on both DAs and (hidden) DA segments. Experimental results on the ICSI meeting recorder dialog act corpus show that our procedure can strictly increase likelihood on training data and can effectively reduce errors on test data. In the best case, test error is reduced by 6.1% relative to our baseline, an improvement over previously reported models that also use prosody. We also compare with our own prosody-based model and show that our HBM is competitive even without the use of prosody. We have not yet succeeded, however, in combining the benefits of both prosody and the HBM.
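The abstract does not spell out the HBM's structure, so the following is only a minimal sketch, under simplifying assumptions, of the general idea of training a word model with EM when one of its conditioning variables is hidden. It assumes each word token w carries a hidden segment label s that depends only on the observed DA tag d, so that P(w | d) = Σ_s P(s | d) P(w | d, s). Unlike an HBM, it uses plain maximum-likelihood relative frequencies with no backoff, and it treats segment labels as independent per token rather than as a segmentation of the utterance. The function name `train_hidden_segment_model` and the toy data are hypothetical. The returned log-likelihood history illustrates the EM property the abstract relies on: training likelihood never decreases from one iteration to the next.

```python
import math
import random
from collections import defaultdict


def train_hidden_segment_model(data, num_segments=2, iterations=10, seed=0):
    """EM for P(s | d) and P(w | d, s) with observed DA tag d and hidden segment s.

    Hypothetical simplification (not the HBM itself): every word token has its
    own hidden segment label, independent of its neighbours given d, and no
    backoff is used. data is a list of (da_tag, [words]) pairs.
    Returns (p_seg, p_word, loglik_history).
    """
    rng = random.Random(seed)
    vocab = sorted({w for _, words in data for w in words})
    tags = sorted({d for d, _ in data})
    segs = range(num_segments)

    # Uniform P(s | d); random positive P(w | d, s) to break the symmetry
    # between segments (identical starts would stay identical under EM).
    p_seg = {d: [1.0 / num_segments] * num_segments for d in tags}
    p_word = {}
    for d in tags:
        for s in segs:
            raw = {w: rng.random() + 0.1 for w in vocab}
            z = sum(raw.values())
            p_word[d, s] = {w: v / z for w, v in raw.items()}

    history = []
    for _ in range(iterations):
        seg_counts = defaultdict(float)   # expected count of (d, s)
        word_counts = defaultdict(float)  # expected count of (d, s, w)
        loglik = 0.0
        # E-step: posterior over the hidden segment of each token, plus the
        # log-likelihood of the data under the current parameters.
        for d, words in data:
            for w in words:
                joint = [p_seg[d][s] * p_word[d, s][w] for s in segs]
                total = sum(joint)
                loglik += math.log(total)
                for s in segs:
                    post = joint[s] / total
                    seg_counts[d, s] += post
                    word_counts[d, s, w] += post
        history.append(loglik)

        # M-step: maximum-likelihood re-estimation from the expected counts.
        for d in tags:
            z = sum(seg_counts[d, s] for s in segs)
            if z == 0.0:  # tag seen only with empty utterances; keep old values
                continue
            p_seg[d] = [seg_counts[d, s] / z for s in segs]
            for s in segs:
                p_word[d, s] = {w: word_counts[d, s, w] / seg_counts[d, s]
                                for w in vocab}
    return p_seg, p_word, history


if __name__ == "__main__":
    # Hypothetical toy data: (DA tag, utterance words).
    toy = [("statement", ["i", "think", "so"]),
           ("question", ["do", "you", "think", "so"]),
           ("statement", ["yeah", "i", "think"]),
           ("question", ["do", "you"])]
    _, _, hist = train_hidden_segment_model(toy, num_segments=2, iterations=5)
    for i, ll in enumerate(hist):
        print(f"iteration {i}: log-likelihood {ll:.4f}")
```

A real HBM would replace the relative-frequency M-step with re-estimation of a generalized backoff model from the expected counts; this sketch only shows the embedding of those counts inside EM.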
Similar resources
Training a prosody-based dialog act tagger from unlabeled data
Dialog act tagging is an important step toward speech understanding, yet training such taggers usually requires large amounts of data labeled by linguistic experts. Here we investigate the use of unlabeled data for training HMM-based dialog act taggers. Three techniques are shown to be effective for bootstrapping a tagger from very small amounts of labeled data: iterative relabeling and retrain...
Automatic Dialog Act Labeling with Minimal Supervision
For many natural language applications it is desirable to be able to automatically tag utterances according to their discourse function (dialog act), such as statement, question or acknowledgment. We investigate the problem of automatically tagging dialog acts when hand-labeled training data is scarce. The tagging paradigm employed is a hidden Markov model in which dialog acts are stat...
Cascaded model adaptation for dialog act segmentation and tagging
There are many speech and language processing problems which require cascaded classification tasks. While model adaptation has been shown to be useful in isolated speech and language processing tasks, it is not clear what constitutes system adaptation for such complex systems. This paper studies the following questions: In cases where a sequence of classification tasks is employed, how importan...
Analyzing and Predicting Patterns of DAMSL Utterance Tags
We have been annotating TRAINS dialogs with dialog acts in order to produce training data for a dialog act predictor, and to study how language is used in these dialogs. We are using DAMSL dialog acts which consist of 15 independent attributes. For the purposes of this paper, infrequent attributes such as Unintelligible and Self-Talk were set aside to concentrate on the eight major DAMSL tag se...
Automatic Dialog Act Corpus Creation from Web Pages
This work presents two complementary tools dedicated to the task of textual corpus creation for linguistic research. The chosen application domain is automatic dialog act recognition, but the proposed tools might also be applied to any other research area that is concerned with dialog processing. The first software captures relevant dialogs from freely available resources on the World Wide ...